Connect to Amazon S3

The Amazon S3 Connector allows you to ingest documents from your Amazon S3 storage directly into your project.

Prerequisites

  • An AWS account with access to the Amazon S3 bucket you wish to ingest.

Authentication Methods

You can connect to Amazon S3 using two primary authentication methods:
  • Access Key and Secret
  • IAM Role ARN

Method 1: Connect with Access Key and Secret

Use this method if you have an AWS access key and secret access key for an IAM user with read permissions on the S3 bucket.

Prerequisites for Access Key and Secret

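  • An AWS Access Key and Access Secret for an IAM user with read access to the bucket, plus the Bucket Name and Region. Access keys are managed in the IAM console under the user's Security credentials tab; the secret access key is shown only at the moment the key is created.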

Configure Data Source in Airia

  1. Select Amazon S3 Data Source
    • Navigate to the Data Sources section of your project.
    • Click Add data source and select Amazon S3 from the available library.
  2. Provide Connection Details
    • Choose the Access Key and Secret authentication method.
    • Fill in the required details:
      • Access Key: Your AWS access key.
      • Access Secret: Your AWS secret access key.
      • Bucket Name: The name of the S3 bucket you want to ingest from.
      • Region: The AWS Region where your S3 bucket was created (e.g., us-east-1).
    ⚠️ Warning: Double-check your Access Key, Access Secret, Bucket Name, and Region for any typographical errors.
  3. Monitor Ingestion Status
    • Once you provide your connection details, the page will refresh to display the ingestion status.
    • You can view the current ingestion status by clicking on the data source again. The detailed list will show all ingested files.
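
The double-check in the warning above can be partly automated before you submit the form. The following is a minimal sketch, not part of Airia: the function name find_connection_typos and the field labels are hypothetical, the bucket pattern covers only a subset of the S3 naming rules, and the Region pattern checks the shape of the code only, not whether the Region exists.

```python
import re

# Subset of the S3 general purpose bucket naming rules: 3-63 characters,
# lowercase letters, digits, dots, and hyphens, starting and ending with a
# letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

# Shape check only (e.g. us-east-1, ap-southeast-2, us-gov-west-1). It does
# not confirm that the Region exists or that the bucket was created there.
_REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def find_connection_typos(fields):
    """Return a list of human-readable problems found in the form values.

    `fields` maps a form label (hypothetical labels, mirroring the form
    described above) to the string the user is about to paste.
    """
    problems = []
    for name, value in fields.items():
        if not value.strip():
            problems.append(f"{name}: empty")
        elif value != value.strip():
            problems.append(f"{name}: leading or trailing whitespace")

    bucket = fields.get("Bucket Name", "").strip()
    if bucket and (not _BUCKET_RE.match(bucket) or ".." in bucket):
        problems.append("Bucket Name: not a valid S3 bucket name")

    region = fields.get("Region", "").strip()
    if region and not _REGION_RE.match(region):
        problems.append("Region: does not look like an AWS Region code")

    return problems
```

An empty list means none of these checks fired; it does not prove that the credentials are valid or that the bucket exists.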

Method 2: Connect with IAM Role ARN

This method avoids sharing long-lived access keys, which makes it the more secure option, especially when connecting across AWS accounts: the AWS account associated with your Airia environment assumes an IAM role in your AWS account. It relies on an IAM Role ARN and an External ID.
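
To make the role-assumption flow concrete, here is a minimal sketch of the general AWS pattern a trusted account follows. It is not Airia's implementation. It assumes the third-party boto3 SDK for the client objects; the function name temporary_credentials and the session name are hypothetical. The caller passes in an STS client, so the function itself has no dependencies.

```python
def temporary_credentials(sts_client, role_arn, external_id):
    """Assume a cross-account role and return short-lived credentials.

    `sts_client` is expected to behave like boto3.client("sts"). The return
    value is shaped as keyword arguments for creating another boto3 client.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName="s3-connector-sketch",  # hypothetical session name
        ExternalId=external_id,  # must match the role's trust policy condition
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


# Usage from the trusted account (requires boto3 and valid AWS credentials):
#
#   import boto3
#   kwargs = temporary_credentials(boto3.client("sts"), role_arn, external_id)
#   s3 = boto3.client("s3", region_name="us-east-1", **kwargs)
#   s3.list_objects_v2(Bucket="customer-bucket-name", MaxKeys=1)
```

The call succeeds only when the caller is a principal named in the role's trust policy and supplies the matching External ID, which is why the same code run from an unrelated account is denied.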

Prerequisites for IAM Role ARN

  • Access to your AWS account (Customer AWS Account) to create IAM roles and policies.
  • The Airia AWS Account ID and External ID (these will be provided to you on the S3 Data Source creation page in Airia).

AWS Setup (Customer Account)

Configure your AWS account (Customer AWS Account) to grant Airia’s AWS environment read access to your bucket. Two accounts are involved:
  • Customer AWS Account: Your AWS account that owns the S3 bucket you want to ingest from.
  • Airia AWS Account (Source Account): The AWS account associated with your Airia environment, which will assume the role you create.

Step 1: Create IAM Role

  1. Navigate to IAM > Roles in your customer AWS account.
  2. Click Create Role.
  3. For the trusted entity, select Another AWS account.
  4. Enter the Airia AWS Account ID. This ID will be provided to you on the S3 Data Source creation page in Airia.
  5. Select the Require external ID checkbox, and enter the External ID. This ID will also be provided to you on the S3 Data Source creation page in Airia.
  6. Click Next: Permissions.
  7. Grant S3 access:
    • To grant read access to all S3 buckets in this account, search for and select the AWS managed policy AmazonS3ReadOnlyAccess.
    • To grant read access to a specific S3 bucket only, click Create policy (or attach an existing one) and use the following JSON:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::customer-bucket-name"
          },
          {
            "Effect": "Allow",
            "Action": [
              "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::customer-bucket-name/*"
          }
        ]
      }
      
      Replace customer-bucket-name with the actual name of your S3 bucket.
  8. Click Next: Tags (optional), then Next: Review.
  9. Provide a meaningful Role name (e.g., AiriaS3ConnectorRole) and an optional description, then click Create role.
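
If you manage several buckets, the bucket-specific policy from step 7 can be generated instead of edited by hand. This is a minimal sketch using only the Python standard library; the function name bucket_read_policy is hypothetical, and the output mirrors the JSON shown above.

```python
import json


def bucket_read_policy(bucket_name):
    """Build the read-only policy from step 7 for a single bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


# Print JSON that can be pasted into the policy editor.
print(json.dumps(bucket_read_policy("customer-bucket-name"), indent=2))
```

The two statements use different resources on purpose: s3:ListBucket is evaluated against the bucket ARN, while s3:GetObject is evaluated against object ARNs, which is why the second resource ends in /*.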

Step 2: Review Trust Policy

  1. After creating the role, navigate to its details page and select the Trust relationships tab.
  2. Ensure the trust policy, viewed in the JSON tab, looks similar to the following:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "<IAM Role ARN from Airia Account>"
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "external-id-here"
            }
          }
        }
      ]
    }
    
    ⚠️ Warning:
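    • The Principal.AWS value must identify the Airia AWS account using the value Airia provides. If you entered only the account ID in Step 1, the console writes it as an account root ARN of the form arn:aws:iam::<Airia AWS Account ID>:root; if Airia gives you a specific IAM user or role ARN, use that ARN instead.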
    • The sts:ExternalId must exactly match the External ID provided by Airia and used during role creation.
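
The two checks in the warning above can be run against the trust policy JSON copied from the console. The following is a minimal sketch, assuming the policy has the shape shown in this step; the function name trust_policy_matches is hypothetical, and the check looks only at StringEquals conditions.

```python
def _as_list(value):
    """IAM policy fields may hold a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def trust_policy_matches(policy, expected_principal, expected_external_id):
    """Return True if some Allow statement lets `expected_principal` call
    sts:AssumeRole under a StringEquals condition on the expected External ID.
    """
    for statement in _as_list_of_statements(policy):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(statement.get("Action", [])):
            continue
        principal = statement.get("Principal")
        if not isinstance(principal, dict):
            continue  # e.g. a wildcard principal, which is not what we expect
        if expected_principal not in _as_list(principal.get("AWS", [])):
            continue
        condition = statement.get("Condition", {}).get("StringEquals", {})
        if condition.get("sts:ExternalId") == expected_external_id:
            return True
    return False


def _as_list_of_statements(policy):
    """Statement may be a single object or a list of objects."""
    statements = policy.get("Statement", [])
    return [statements] if isinstance(statements, dict) else statements
```

Load the copied policy with json.loads and pass the principal ARN and External ID shown on the S3 Data Source creation page. A False result means no statement matched both values exactly.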

Step 3: Copy Role ARN and External ID

  1. From the IAM role details page, copy the Role ARN.
  2. Retrieve the External ID you used when creating the role.
  3. You will use both of these values when configuring the Amazon S3 data source in Airia.

Configure Data Source in Airia

  1. Select Amazon S3 Data Source
    • Navigate to the Data Sources section of your project.
    • Click Add data source and select Amazon S3 from the available library.
  2. Provide Connection Details
    • Choose the IAM Role ARN authentication method.
    • Fill in the required details:
      • Role ARN: The ARN of the IAM role you created in your customer AWS account.
      • External ID: The external ID used when creating the IAM role.
      • Bucket Name: The name of the S3 bucket you want to ingest from.
      • Region: The AWS Region where your S3 bucket was created (e.g., us-east-1).
    ⚠️ Warning: Double-check your Role ARN, External ID, Bucket Name, and Region for any typographical errors.
  3. Monitor Ingestion Status
    • Once you provide your connection details, the page will refresh to display the ingestion status.
    • You can view the current ingestion status by clicking on the data source again. The detailed list will show all ingested files.
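
Role ARNs and External IDs are easy to truncate when copying. Here is a minimal sketch of a format check you could run before pasting them into the form; the function names parse_role_arn and is_valid_external_id are hypothetical, and the patterns reflect the documented IAM role ARN layout and the External ID constraints of the STS AssumeRole API (2 to 1224 characters from a restricted set).

```python
import re

# arn:<partition>:iam::<12-digit account ID>:role/<optional path/><role name>
_ROLE_ARN_RE = re.compile(
    r"arn:(?:aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@/-]+)",
    re.ASCII,
)

# External IDs: 2-1224 characters; letters, digits, underscore, and + = , . @ : / -
_EXTERNAL_ID_RE = re.compile(r"[\w+=,.@:/-]{2,1224}", re.ASCII)


def parse_role_arn(role_arn):
    """Return (account_id, role_path_and_name), or None if the ARN is malformed."""
    match = _ROLE_ARN_RE.fullmatch(role_arn)
    return (match.group(1), match.group(2)) if match else None


def is_valid_external_id(external_id):
    """Check the External ID against the STS length and character limits."""
    return _EXTERNAL_ID_RE.fullmatch(external_id) is not None
```

The account ID returned by parse_role_arn should be your own (Customer AWS Account) ID, since the role lives in the account that owns the bucket. These checks validate format only; they do not confirm that the role exists or can be assumed.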

Next Steps

After your data has been successfully ingested, the Amazon S3 data source is ready to be used with an Agent.